NSF PAR Search | NSF Public Access Repository


LLMs in Action: Robust Metrics for Evaluating Automated Ontology Annotation Systems

https://doi.org/10.3390/info16030225

Noori, Ali; Devkota, Pratik; Mohanty, Somya D.; Manda, Prashanti (March 2025, Information)

Ontologies are critical for organizing and interpreting complex domain-specific knowledge, with applications in data integration, functional prediction, and knowledge discovery. As the manual curation of ontology annotations becomes increasingly infeasible due to the exponential growth of biomedical and genomic data, natural language processing (NLP)-based systems have emerged as scalable alternatives. Evaluating these systems requires robust semantic similarity metrics that account for hierarchical and partially correct relationships often present in ontology annotations. This study explores the integration of graph-based and language-based embeddings to enhance the performance of semantic similarity metrics. Combining embeddings generated via Node2Vec and large language models (LLMs) with traditional semantic similarity metrics, we demonstrate that hybrid approaches effectively capture both structural and semantic relationships within ontologies. Our results show that combined similarity metrics outperform individual metrics, achieving high accuracy in distinguishing child–parent pairs from random pairs. This work underscores the importance of robust semantic similarity metrics for evaluating and optimizing NLP-based ontology annotation systems. Future research should explore the real-time integration of these metrics and advanced neural architectures to further enhance scalability and accuracy, advancing ontology-driven analyses in biomedical research and beyond.
Free, publicly-accessible full text available March 1, 2026
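The hybrid-metric idea summarized in the abstract above can be illustrated with a small sketch. It assumes a hypothetical toy ontology, made-up two-dimensional vectors standing in for Node2Vec (graph) and LLM (text) embeddings, Jaccard overlap of ancestor sets as the "traditional" metric, and an equal-weight average as the combination rule. None of these choices is taken from the paper. A pair is labeled child–parent when its combined score reaches a threshold, and accuracy is measured against random, unrelated pairs.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (not a real ontology).
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"], "B1": ["B"]}

# Made-up vectors standing in for Node2Vec (graph) and LLM (text) embeddings.
GRAPH_EMB = {"root": [1.0, 1.0], "A": [1.0, 0.2], "B": [0.2, 1.0],
             "A1": [1.0, 0.0], "A2": [0.9, 0.1], "B1": [0.0, 1.0]}
TEXT_EMB = {"root": [0.5, 0.5], "A": [0.8, 0.3], "B": [0.3, 0.8],
            "A1": [0.9, 0.1], "A2": [0.85, 0.2], "B1": [0.1, 0.9]}


def ancestors(term):
    """Return the term together with every ancestor reachable through parent links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard_sim(a, b):
    """Hierarchy-based similarity: overlap of the two ancestor sets."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combined_sim(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of structural, graph-embedding and text-embedding scores (assumed rule)."""
    scores = (jaccard_sim(a, b),
              cosine(GRAPH_EMB[a], GRAPH_EMB[b]),
              cosine(TEXT_EMB[a], TEXT_EMB[b]))
    return sum(w * s for w, s in zip(weights, scores))


def pair_accuracy(child_parent_pairs, random_pairs, threshold=0.5):
    """Fraction of pairs classified correctly when score >= threshold means 'child-parent'."""
    correct = sum(combined_sim(a, b) >= threshold for a, b in child_parent_pairs)
    correct += sum(combined_sim(a, b) < threshold for a, b in random_pairs)
    return correct / (len(child_parent_pairs) + len(random_pairs))


if __name__ == "__main__":
    positives = [("A1", "A"), ("A2", "A"), ("B1", "B")]
    negatives = [("A1", "B1"), ("A2", "B"), ("B1", "A")]
    print(pair_accuracy(positives, negatives))
```

The weights and threshold are free parameters in this sketch; swapping in other structural metrics or real embedding vectors only changes the inputs to `combined_sim`.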
Leveraging Large Language Models and RNNs for Accurate Ontology-Based Text Annotation

https://doi.org/10.5220/0013267100003911

Devkota, Pratik; Mohanty, Somya; Manda, Prashanti (January 2025, SCITEPRESS - Science and Technology Publications)

Full Text Available
Improving the Evaluation of NLP Approaches for Scientific Text Annotation with Ontology Embedding-Based Semantic Similarity Metrics

Devkota, Pratik; Mohanty, Somya; Manda, Prashanti (December 2023, ACL Anthology)

Full Text Available
Using ontology embeddings with deep learning architectures to improve prediction of ontology concepts from literature

Devkota, Pratik; Mohanty, Somya; Manda, Prashanti (September 2023, CEUR workshop proceedings)

Full Text Available
A deep semantic matching approach for identifying relevant messages for social media analysis

https://doi.org/10.1038/s41598-023-38761-y

Biggers, Frederick Brown; Mohanty, Somya D.; Manda, Prashanti (July 2023, Scientific Reports)

There is growing interest in using social media content for Natural Language Processing applications. However, it is not easy to computationally identify the set of tweets most relevant to a specific event. Challenging semantics, coupled with the varied ways natural language is used on social media, make it difficult to retrieve the most relevant data from any social media outlet. This paper demonstrates a way to present the changing semantics of Twitter within the context of a crisis event, specifically tweets posted during Hurricane Irma. These methods can be used to identify the corpus of text most relevant to a specific incident such as a hurricane. Using an implementation of Word2Vec, a neural network method for training word embeddings, this paper will: discuss how the relative meaning of words changes as events unfold; present a mechanism for scoring tweets based upon dynamic, relative context relatedness; and show that similarity between words is not necessarily static. We present different methods for training the vector model in Word2Vec to identify the most relevant tweets for any search query. We explore the impact of tuning parameters such as Word Window Size, Minimum Word Frequency, Hidden Layer Dimensionality, and Negative Sampling on model performance. For each parameter, the window containing the local maximum of AU_ROC serves as a guide for other studies applying these methods to social media data analysis.
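To make the scoring and evaluation steps in this abstract concrete, here is a minimal sketch. It assumes word vectors have already been trained (small made-up vectors replace a real Word2Vec model), scores a tweet by the cosine similarity between its mean word vector and the query's mean word vector, and computes AU_ROC with the rank-based (Mann–Whitney) formulation. The mean-vector scoring rule is an assumption for illustration, not the paper's dynamic context-relatedness score.

```python
import math

# Made-up 3-d vectors standing in for a trained Word2Vec model (hypothetical values).
VECS = {
    "hurricane": [0.9, 0.1, 0.0],
    "irma": [0.85, 0.15, 0.05],
    "flooding": [0.7, 0.3, 0.1],
    "shelter": [0.6, 0.4, 0.0],
    "pizza": [0.0, 0.2, 0.9],
    "game": [0.1, 0.1, 0.95],
}


def mean_vector(tokens):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    known = [VECS[t] for t in tokens if t in VECS]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def relevance_score(tweet, query):
    """Cosine similarity between the tweet's and the query's mean word vectors."""
    t = mean_vector(tweet.lower().split())
    q = mean_vector(query.lower().split())
    return cosine(t, q) if t is not None and q is not None else 0.0


def auroc(scores, labels):
    """Probability that a random relevant item outscores a random irrelevant one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    tweets = ["Hurricane Irma flooding downtown", "Need shelter from Irma",
              "Pizza and the game tonight", "Great game"]
    labels = [1, 1, 0, 0]
    scores = [relevance_score(t, "hurricane") for t in tweets]
    print(auroc(scores, labels))
```

In a real pipeline, `VECS` would come from a trained model; in gensim's `Word2Vec`, for instance, the `window`, `min_count`, `vector_size`, and `negative` parameters correspond to the tuning parameters listed in the abstract.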
Ontology-Powered Boosting for Improved Recognition of Ontology Concepts from Biological Literature

https://doi.org/10.5220/0011683200003414

Devkota, Pratik; Mohanty, Somya; Manda, Prashanti (January 2023, 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023))

Full Text Available
Knowledge of the Ancestors: Intelligent Ontology-aware Annotation of Biological Literature using Semantic Similarity

Devkota, Pratik; Mohanty, Somya; Manda, Prashanti (September 2022, International Conference on Biomedical Ontology)

Full Text Available
A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature

https://doi.org/10.1186/s13040-022-00310-0

Devkota, Pratik; Mohanty, Somya D.; Manda, Prashanti (September 2022, BioData Mining)

Background: Annotating scientific literature with ontology concepts is a critical task for knowledge discovery in biology and several other domains. Ontology-based annotations can power large-scale comparative analyses in applications ranging from evolutionary phenotypes to rare human diseases to the study of protein functions. Computational methods for tagging scientific text with ontology terms have included lexical/syntactic methods, traditional machine learning, and, most recently, deep learning.

Results: Here, we present state-of-the-art deep learning architectures based on Gated Recurrent Units (GRUs) for annotating text with ontology concepts. We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for training and testing. We explore a number of additional information sources, including NCBI’s BioThesaurus and the Unified Medical Language System (UMLS), to augment the information from CRAFT and increase prediction accuracy. Our best model achieves an F1 score of 0.84, with performance also measured by semantic similarity.

Conclusion: The results shown here underscore the impact of using deep learning architectures for automatically recognizing ontology concepts from literature. Augmenting the models with biological information beyond that present in the gold standard corpus yields a distinct improvement in prediction accuracy.
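The abstract above reports both an F1 score and semantic similarity. The sketch below shows one way such a pair of numbers can be computed for token-level concept predictions. It assumes hypothetical concept IDs, "O" for tokens with no concept, exact-match F1, and a Jaccard-over-ancestors semantic similarity that gives partial credit when a prediction is close in the hierarchy (for example, a parent of the gold concept). The paper's exact evaluation protocol may differ.

```python
# Hypothetical ontology: concept ID -> parent IDs (not real CRAFT or GO terms).
PARENTS = {"T0": [], "T1": ["T0"], "T2": ["T0"], "T11": ["T1"]}


def ancestors(term):
    """Return the concept plus every ancestor reachable through parent links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b):
    """Semantic similarity as overlap of ancestor sets (1.0 for an exact match)."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)


def evaluate(gold, pred, outside="O"):
    """Exact-match F1 plus mean similarity over tokens where both sides carry a concept."""
    # g == p != outside: the labels agree and are a real concept, not "O".
    tp = sum(g == p != outside for g, p in zip(gold, pred))
    n_pred = sum(p != outside for p in pred)
    n_gold = sum(g != outside for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    pairs = [(g, p) for g, p in zip(gold, pred) if g != outside and p != outside]
    sim = sum(jaccard(g, p) for g, p in pairs) / len(pairs) if pairs else 0.0
    return f1, sim


if __name__ == "__main__":
    gold = ["O", "T11", "O", "T2", "T1"]
    pred = ["O", "T1", "O", "T2", "O"]
    f1, sim = evaluate(gold, pred)
    print(round(f1, 3), round(sim, 3))  # prints: 0.4 0.833
```

Here the prediction labels a parent concept (T1) where the gold standard has its child (T11): exact-match F1 counts this as an error, while the semantic-similarity score still awards partial credit (2/3 for that token).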
Knowledge of the Ancestors: Intelligent Ontology-aware Annotation of Biological Literature using Semantic Similarity

Devkota, Pratik; Mohanty, Somya; Manda, Prashanti (January 2022, International Conference on Biomedical Ontology)

Full Text Available
Automated ontology-based annotation of scientific literature using deep learning

https://doi.org/10.1145/3391274.3393636

Manda, Prashanti; SayedAhmed, Saed; Mohanty, Somya D. (July 2020, Proceedings of The International Workshop on Semantic Big Data)
Full Text Available